Conversation

@iverase
Contributor

@iverase iverase commented Jul 24, 2025

I think the CentroidQueryScorer leaks a lot of implementation details about our search strategy. We only want an object that feeds the algorithm with the centroids to be searched, so I think it makes sense to replace the CentroidQueryScorer interface with a CentroidIterator interface. That way, the details of how we build the iterator are hidden from the search strategy.
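
To illustrate the shape of the proposal, here is a minimal sketch, not the code in this PR. The method names `hasNext`/`nextCentroid` and the classes `ScoredCentroidIterator` and `CentroidSearch` are hypothetical, and a plain dot product stands in for whatever scoring the real implementation uses. The point is only that the search loop sees an ordering of centroids and nothing about how it was produced.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

/** Hypothetical minimal contract: the search strategy only asks for the next centroid to visit. */
interface CentroidIterator {
    boolean hasNext();

    int nextCentroid();
}

/** Hypothetical implementation: scores every centroid up front and returns ordinals best-first. */
final class ScoredCentroidIterator implements CentroidIterator {
    private final Integer[] order;
    private int pos = 0;

    ScoredCentroidIterator(float[][] centroids, float[] query) {
        float[] scores = new float[centroids.length];
        Integer[] ordinals = new Integer[centroids.length];
        for (int i = 0; i < centroids.length; i++) {
            scores[i] = dot(centroids[i], query); // assumption: dot product as the similarity
            ordinals[i] = i;
        }
        // Highest score first; the scores never leave this class.
        Arrays.sort(ordinals, (a, b) -> Float.compare(scores[b], scores[a]));
        this.order = ordinals;
    }

    private static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    @Override
    public boolean hasNext() {
        return pos < order.length;
    }

    @Override
    public int nextCentroid() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return order[pos++];
    }
}

/** Hypothetical search loop: it consumes the iterator and never touches scores or queues. */
final class CentroidSearch {
    static int[] visitOrder(CentroidIterator it, int nProbe) {
        int[] visited = new int[nProbe];
        int n = 0;
        while (n < nProbe && it.hasNext()) {
            visited[n++] = it.nextCentroid();
        }
        return Arrays.copyOf(visited, n);
    }
}
```

With this split, swapping in a different way of ordering centroids would only mean providing another `CentroidIterator`; `CentroidSearch` would not change.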

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) on Jul 24, 2025
@tteofili
Contributor

While I agree with the rationale, the centroid scores computed by CentroidQueryScorer might be useful for further optimizations. This is currently not possible anyway because the scores are hidden in the NeighborQueue, but I would like to see whether it's possible to have those <centroid, score> pairs available somewhere for, e.g., IVFVectorsReader to use.
For example, I have a rough patch where I explicitly store them in a List for further consumption.

@iverase
Contributor Author

iverase commented Jul 24, 2025

@tteofili you can use #topScore() to get the current score. Why would you want to store those scores in a list?

@iverase
Contributor Author

iverase commented Jul 24, 2025

@tteofili I had a look into your patch and you can add the logic when computing the iterator and add a method to the iterator API called something like getRecommendedNprobe or something of those sorts.
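
A rough sketch of what that could look like, for illustration only. Everything except the name getRecommendedNprobe is hypothetical: the class `HintingCentroidIterator`, the `keepRatio` parameter, and the cutoff heuristic are placeholders, not the logic from the patch. The heuristic also assumes positive scores where higher is better.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

/** Hypothetical iterator that computes a probe-count hint while it is being built. */
final class HintingCentroidIterator {
    private final int[] order; // centroid ordinals, best score first
    private final int recommendedNprobe;
    private int pos = 0;

    HintingCentroidIterator(float[] scores, float keepRatio) {
        Integer[] ordinals = new Integer[scores.length];
        for (int i = 0; i < scores.length; i++) {
            ordinals[i] = i;
        }
        Arrays.sort(ordinals, (a, b) -> Float.compare(scores[b], scores[a]));
        this.order = Arrays.stream(ordinals).mapToInt(Integer::intValue).toArray();

        // Placeholder heuristic: count centroids scoring at least keepRatio * best score.
        int n = 0;
        if (order.length > 0) {
            float cutoff = scores[order[0]] * keepRatio;
            while (n < order.length && scores[order[n]] >= cutoff) {
                n++;
            }
        }
        this.recommendedNprobe = n;
    }

    boolean hasNext() {
        return pos < order.length;
    }

    int nextCentroid() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return order[pos++];
    }

    /** Hint only; the caller decides whether to honor it. */
    int getRecommendedNprobe() {
        return recommendedNprobe;
    }
}
```

Note that a hint computed this way depends on every centroid having been scored when the iterator is built.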

@tteofili
Contributor

> @tteofili I had a look into your patch and you can add the logic when computing the iterator and add a method to the iterator API called something like getRecommendedNprobe or something of those sorts.

yes, thx, I can do something along those lines although it might sound a bit weird for such an iterator to have those optimization responsibilities. not 100% sure though

@iverase
Contributor Author

iverase commented Jul 24, 2025

Still, your optimization assumes that we will always score all the centroids. If we ever add a parent layer to the centroids, your optimization will not work. I think we should not assume we have the scores for all the centroids.

@tteofili
Contributor

tteofili commented Jul 24, 2025

I'll open an issue as soon as I have more confidence that doing anything like that is useful to speed up search (e.g., for multi-segment), and we can discuss it there.
Other than that, I am fine with turning the CentroidQueryScorer into an iterator.

Contributor

@tteofili tteofili left a comment

LGTM, other than an inconsistency between code and comment.

Comment on lines 113 to 114
// Here we assume `lx` is simply bit vectors, so the scaling isn't necessary
float lx = (targetCorrections[1] - ax) * FOUR_BIT_SCALE;
Contributor

I noticed this while going through the code with @john-wagster: the comment says scaling isn't necessary, but then we do scale. Either the comment is outdated or we have a bug (seen it in other parts of the code).

Contributor Author

Copy-paste error in the comment.

Contributor

@john-wagster john-wagster left a comment

I'm good with this. No concerns. I think it's a good cleanup. My only thought is that the centroid 1-bit quantization / rescoring work is going well, will probably be beneficial, and is definitely going to conflict heavily with this change. That work will likely benefit even more from this cleanup. So it may be worth considering waiting until early next week, when that work hopefully goes in?

@iverase
Contributor Author

iverase commented Jul 24, 2025

I spoke to @wags and we agree that this is a good step forward, so merging.

@iverase iverase merged commit 4eaa2ea into elastic:main Jul 24, 2025
33 checks passed
@iverase iverase deleted the centroidIterator branch July 24, 2025 15:39

Labels

>non-issue, :Search Relevance/Vectors (Vector search), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch), v9.2.0

4 participants